Practical Methods for Creating CDISC SDTM Domain Data Sets from Existing Data
Abstract
Creating CDISC SDTM domain data sets from existing clinical trial data can be a challenging task, particularly if the database was not designed with the SDTM standards in mind. A key step in the process involves determining which of the SDTM domain data sets need to be produced for submission and then determining what conversion process will be necessary to produce them from the existing data. Adequate planning and documentation of the conversion process is an essential first step before programming begins. The basic component of the planning phase is metadata mapping: determining how each of the variables in the existing data will relate to the variables contained in the SDTM domains to be produced. The documentation of the conversion process should be recorded in a format that facilitates efficient access by those involved in the planning, programming and validation phases of the conversion. Tools suited to the task of complex data mapping and data manipulation can significantly reduce cost and improve quality. This paper presents an example of a simple metadata mapping tool developed using SAS, Microsoft Excel and Visual Basic. The examples in this paper are based on the CDISC SDTM version 1.1, the SDTM Implementation Guide version 3.1.1 and SAS® version 9.1.3.

INTRODUCTION

In order to increase the efficiency of the drug development process, the Clinical Data Interchange Standards Consortium (CDISC) has developed a series of clinical study data standards to facilitate efficient transfer, access and review of clinical trial data. These standards include the Operational Data Model (ODM), the Study Data Tabulation Model (SDTM) and the Analysis Data Model (ADaM). This paper presents basic strategies and practical methods for creating SDTM domain data sets from clinical data management (CDM) system files. Before initiating the data mapping and conversion process it is crucial to have a basic understanding of the SDTM specifications.
CDISC provides implementation guides for all of the CDISC data standards on its website (www.cdisc.org). The SDTM Implementation Guide (SDTMIG) is an essential tool for anyone involved with the metadata mapping or programming associated with the creation of SDTM data sets. It contains the specifications and metadata for all of the SDTM data domains and guidance for producing SDTM domain files. The SDTM is an evolving standard and it is important to ensure that everyone involved in the conversion process is adhering to the same version of the SDTM. It is also important to understand that the SDTM standard and the associated implementation guide carry different version numbers. The most recent versions in production are SDTM 1.1 and SDTMIG 3.1.1, which were released in 2005.

CDISC SDTM OVERVIEW

The purpose of creating CDISC SDTM domain data sets is to provide Case Report Tabulation (CRT) data to a regulatory agency, such as the FDA, in a standardized format that is compatible with available software tools that allow efficient access and correct interpretation of the data submitted. The SDTMIG provides documentation on metadata for the domain data sets that includes the file name, variable names, types, labels, formats, roles and controlled terminology. While most of the SDTM domain data sets have a normalized (vertical) structure, they were not designed for use in a clinical data management (CDM) system. It is nevertheless highly desirable to incorporate CDISC standards to the extent practical when designing CDM data structures. Proper adherence to the standards can greatly reduce the effort necessary for data mapping. The important standards to adhere to are domain name, variable name, variable type and format. Matching the SDTM variable labels is not important: the standard labels are available in the standard metadata, and labels are not used for match merging in the mapping process.
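As a rough illustration of why names and types matter more than labels, the sketch below match-merges the metadata of an existing data set against standard metadata, keyed on domain and variable name. It is written in Python for brevity and is not the SAS, Excel and Visual Basic tool the paper describes. The `VarMeta` record, the `match_merge` function, the CDM variables and the SDTM excerpt are all hypothetical and are not copied from the SDTMIG.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VarMeta:
    """One row of variable-level metadata (hypothetical layout)."""
    domain: str
    name: str
    type: str   # "Char" or "Num"
    label: str


# Illustrative excerpt of standard metadata for a demographics domain.
# Assumption: labels here are placeholders, not quoted from the SDTMIG.
SDTM_META = [
    VarMeta("DM", "STUDYID", "Char", "Study Identifier"),
    VarMeta("DM", "USUBJID", "Char", "Unique Subject Identifier"),
    VarMeta("DM", "AGE", "Num", "Age"),
    VarMeta("DM", "SEX", "Char", "Sex"),
]

# Hypothetical metadata from an existing CDM data set.
CDM_META = [
    VarMeta("DM", "STUDYID", "Char", "Protocol"),       # label differs only
    VarMeta("DM", "USUBJID", "Char", "Subject"),        # label differs only
    VarMeta("DM", "AGE", "Char", "Age (yrs)"),          # stored as text
    VarMeta("DM", "GENDER", "Char", "Gender"),          # non-standard name
]


def match_merge(existing, standard):
    """Classify each existing variable against the standard metadata.

    The merge key is (domain, name). Labels are deliberately ignored.
    """
    std_by_key = {(v.domain, v.name): v for v in standard}
    report = []
    for var in existing:
        target = std_by_key.get((var.domain, var.name))
        if target is None:
            status = "needs mapping"
        elif target.type != var.type:
            status = "type mismatch"
        else:
            status = "conforms"
        report.append((var.domain, var.name, status))
    return report


def unsourced(existing, standard):
    """List standard variables that no existing variable matches by name."""
    have = {(v.domain, v.name) for v in existing}
    return [(v.domain, v.name) for v in standard if (v.domain, v.name) not in have]


if __name__ == "__main__":
    for row in match_merge(CDM_META, SDTM_META):
        print(row)
    print("No source by name:", unsourced(CDM_META, SDTM_META))
```

In this made-up case, the two variables whose labels differ still count as conforming, while the text-valued `AGE` and the non-standard `GENDER` are flagged for conversion work. A dictionary keyed on domain and name stands in for the match merge; a real tool would read both sides from the data set metadata rather than from hard-coded lists.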
While the SDTM documentation does not specify variable lengths, it is highly desirable to maintain consistency in length among variables with the same name across domains and between studies. While the SDTM data sets do contain some derived variables, they are not designed for use as analysis data sets. Adherence to the "one proc away" philosophy for analysis files dictates the addition of derived variables and conversion to a horizontal structure. The SDTM data sets can, however, be used in the creation of analysis files. The creation of standardized SDTM data sets will aid in the creation of analysis files for each individual study, and the future task of integrating data from multiple studies will be accomplished with greater efficiency and quality. The ability to submit SDTM data sets in place of listings or patient profiles can result in additional cost reductions.

Pharma, Life Sciences and Healthcare, SAS Global Forum 2008